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The main problem that we will face in the data analysis for continuous gravitational-wave 
sources is processing of a very long time series and a very large parameter space. We present a 
number of analytic and numerical tools that can be useful in such a data analysis. These consist 
of methods to calculate false alarm probabilities, use of probabilistic algorithms, application of 
signal splitting, and accurate estimation of parameters by means of optimization algorithms. 



1 Introduction 

The interest in the data analysis for continuous gravitational-wave signals is growing. A prime 
example of a source of such signals is a spinning neutron star. These may be the first signals to 
look for in data streams from the long-arm laser interferometers and they are attractive sources 
for currently operating bar detectors. A number of theoretical papers were published on various 
data analysis schemes and a certain amount of real data were analyzed involving data from 
the prototypes of laser interferometers and from the bar detectors. Limited space precludes a 
review of available literature on this subject. The theoretical research clearly shows that data 
analysis will involve processing of very long time series and exploration of a very large parameter 
space. It turns out that optimal processing of month long data streams that we need to ensure 
a reasonable chance of detection of such signals is computationally prohibitive tl and methods 
to reduce the computational burden must be found. In this contribution we present a few data 
analysis tools that can be useful in processing data for continuous signals and that were a subject 
of our recent research. We do not offer a complete end-to-end data analysis scheme. 



2 Data analysis tools for continuous signals 



2. 1 Significance of detected events 

We assume that the noise n in the detector is an additive, stationary, Gaussian, and zero-mean 
continuous in time t random process. Then the data x (if the signal h is present) can be written 
as 

x = n + h. (1) 



We have shown au that the gravitational- wave signal from a spinning neutron star can be ap- 
proximated by the following signal 



h(t; h , $ , £) = K sin [*(t; £) + $ ] , (2) 



where 
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and where T is the observation time, Res = 1AU is the mean distance from the Earth's center 
to the SSB, Re is the mean radius of the Earth, Q Q is the mean orbital angular velocity of 
the Earth, 4> is a deterministic phase which defines the position of Earth in its orbital motion 
at t = 0, e is the angle between the ecliptic and the Earth's equator, A is detector's latitude. 
The vector £ = (ai, fo, ■ ■ ■ , / s ), so the phase $ depends on s + 3 parameters, h Q and <£ are 
respectively constant amplitude and phase. The parameters a% and a 2 are defined by 

a i := fo (cos e sin a cos (5 + sine sin (5) , a 2 := fo cos a cos S, (4) 

where a is right ascention and 8 is declination of the source. 

From the maximum likelihood principle we find that in order to detect the signal we need 
to compute the following statistics 

H(i) = ^rAF\\ (5) 
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and where Sh{fo) is the one-sided spectral density at frequency fo. We assume that we can 
neglect variations of over the bandwidth of the signal. The statistics J r (^) is a certain 
generalized multiparameter random process called the random fieldQ. If the vector £ is one- 
dimensional the random field is simply a random process. When the signal is absent the above 
random field T is homogeneous i.e. its mean m is constant and the autocovariance function C 
depends only on the difference r := £ — 



C(£,0 := E{[T{i)-m{i)][Hi')-m{i')]} = C{T). (7) 

The homogeneity of the random field is a direct consequence of the linearity of the phase of the 
signal in parameters. Moreover one can show that for the case of a Gaussian noise the field T is 
a (generalized) x 2 fieldS. To assess the significance of events we need to calculate the probability 
the statistics J- crosses a certain threshold when the data is only noisei which is called the false 
alarm probability. We have developed two approaches to calculate this probability: one based 
on the concept of an elementary cell of the parameter space and the other on the geometry of 
the random fields. 



Cells 



The correlation function C(r) has a maximum at r = and we can define the characteristic 
correlation hypersurface by the equation 

C(r) = lc(0). (8) 

The volume of this hypersurface defines an elementary cell in the parameter space and we can 
calculate the number of cells N c from the formula 

T/total 

N c = y^T- (9) 

where Vtotai is the volume of the parameter space. We approximate the probability distribution 
JF(£) in each cell by probability po(J-) when the signal parameters are known (in our case 
PoiJ 7 ) = exp(— J 7 )). The values of the statistics T in each cell can be considered as independent 
random variables. The probability that T does not exceed the threshold J- Q in a given cell is 
1 — exp(— J- Q ). Consequently the probability that T does not exceed the threshold T Q in all the 
N c cells is [1 — exp(— J- )] . The probability Pp that T exceeds T Q in one or more cell is given 
by 

p-(JF ) = l-[l-exp(-^ )] JVc . (10) 
This is the false alarm probability when the phase parameters are unknown. 

Geometry of random fields 

The second approach to calculate the false alarm probability involves calculation of the number 
of threshold crossings by the statistics T. We need to count somehow the number of times a 
random field crosses a fixed hypersurface. Let J-(£) be M-dimensional homogeneous real-valued 
random field where parameters £ = . . . , £m) belong to M-dimensional Euclidean space 
and let K be a compact subset of M. M . We define the excursion set of J~(£) inside K above the 
level T as 

A T {T ,K):={i^K:T>To\. (11) 

It was found i that when the excursion set does not intersect the boundary of the set K then 
a suitable analogue of the mean number of level crossings is the expectation value of the Euler 
characteristic x °f the set Ayr. For simplicity we shall denote xlA-ri^o-, K)] by XT a - It turns 
out that using the Morse theory the expectation values of the Euler characteristic of Ajr can 
be given in terms of certain multidimensional integrals § and close form formulae could be 
obtained for homogeneous M-dimensional Gaussian fields and 2-dimensional x 2 fieldsu. Recently 
Worsley I obtained explicit formulae for M-dimensional homogeneous x 2 field. One finds that 
asymptotically (i.e. for large thresholds) the false alarm probability can be approximated by 

p9 ~ (-^0/2) pC / 10 v 

Fp - r(M/2 + ir F ' {LZ) 



where Pp is given by Eq. 10. Our numerical simulations have shown that the formula for 
the false alarm probability based on the cell concept (Eq. |K]) applies when the statistics T 
is coarsely sampled by means of FFT whereas formula based on the expectation of the Euler 



characteristic (Eq. |12| ) applies when the statistics is finely sampled. Using Eq. 1C We have 
found that for observation time T Q of 7 days and extremal values of spindowns estimated form 
\fk\max = fc![ T" x I where maximum frequency f max is 1kHz and minimum spindown age T m i n 
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is taken as 10 3 yr the number of cells is 10 16 what corresponds to 10 Teraflop computing power 



when we want to keep the data processing rate equal to data acquisition rate. In numerical 
calculation we expand the left hand side of Eq. || in Taylor series keeping only terms to the 2nd 
order. This gives approximation of the hypersurface of correlation by a hyperellipse. 

It is interesting to note that the above approach has been used in the search for significant 
signals in the cosmic microwave background mapsQ and brain tomography images 0. 



2.2 Probabilistic algorithms 

Our estimate at the end of the previous section shows that the number of independent cells to 
search for continuous signals is very large and computing power required huge. However in data 
analysis for gravitational-wave signals we would first like to know whether any gravitational- 
waves are present. Our primary objective is to detect the first gravitational-wave signal. To 
do so one does not need to search the whole parameter space. A related problem occurs in 
testing whether a certain very large number is a prime. Primes are very important in designing 
encryption schemes. Here a brilliant solution in the form of Miller-Rabin test was found with 
the help of probabilistic algorithms. We want to test that a certain very large natural number 
n is a prime. We choose randomly a natural number bi less than n. We then perform the Miller 
test which shows for some numbers bi with certainty that n is not a prime. If test is passed bi is 
called witness of nonprimality. Then by lemma due to Rabin we know that if n is not a prime 
at least 1/2 of the positive integers between 1 and n are witnesses. Thus probability Ppthat n 
is a prime after m random choices of bi is > 1 — l/2 m and thus for very modest values of m 
is very high. Of course we only know that n is a prime with a certain (very high) probability. 
However this is good enough for many applications. In the case of signal detection in noise we 
can only detect the signal with the certain probability thus use of probabilistic algorithms is 
not a limitation. However here we are only able to offer a very naive application of probabilistic 
algorithms: choose the values of parameters of the templates randomly from the parameter space 
instead of searching systematically the parameter space. If we have N c cells in the parameter 
space and there are signals in Ne cells the probability of detection of the signal after m random 
choices of a cell to search is given by 

pd = i - a - ^r- (13) 

Assuming that ^ < < 1 we find that we need random choices of cells in order that detection 
probability is > 90%. This can lead to a substantial reduction of the computational cost to detect 
the first signal with respect to the cost to search the whole parameters space. For example for 
the case given at the end of the previous section if the number of signals is 10 3 the computational 
power decreases to a manageable amount of 10 Gflops. 



2.3 Signal splitting 

In this section we would like to discuss a technique that can distribute the data processing 
into several smaller computers. We shall call this technique signal splitting. We can divide 
the available bandwidth of the detector (/ m i n , / max ) into M adjacent intervals of length B. We 
then apply a standard technique of heterodyning. For each of the chosen bands we multiply 
our data time series by exp(— 2-irifi), where // = f m in + IB, (I = 0, ... ,M — 1). Such an 
operation moves the spectrum of the data towards zero by frequency fj. We then apply a low 
pass filter with a cutoff frequency B and we resample the resulting sequence with the frequency 
2B. The result is M time series sampled at frequency 2B instead of one sampled at 2/ max . 
The resampled sequences are shorter than the original ones by a factor of M and each can be 
processed by a separate computer. We only need to perform the signal splitting operation once 
before the signal search. The splitting operation can also be performed continuously when the 



data are collected so that there is no delay in the analysis. The signal splitting does not lead to 
a substantial saving in the total computational power but yields shorter data sequences for the 
analysis. For example for the case of 7 days of observation time and sampling rate of 1 kHz the 
data itself would occupy around 10 GB of memory (assuming double precision) which is available 
on expensive supercomputers whereas if we split the data into a bandwidth of 50 Hz so that 
sampling frequency is only 100 Hz each sequence will occupy 0.5 GB memory which is available 
on inexpensive personal computers. Analysis of these short sequences can be distributed over a 
large number of small computers with no need of communication between the processors. 

2.4 Fine search 

To estimate the signal parameters we need to find the global maximum of the statistics T '. 
We propose an algorithm that consists of two parts: a coarse search and a fine search. The 
coarse search involves calculation of T on an appropriate grid in parameter space and finding 
the maximum value of T on the grid and the values of the parameters of the signal that give 
the maximum. This gives coarse estimators of the parameters. Fine search involves finding the 
maximum of T using fast maximization routines with the starting values determined from the 
coarse estimates of the parameters. The grid of the coarse search is determined by the region of 
convergence of the maximization routine. We have tested extensively by means of Monte Carlo 
simulation one such maximization algorithm - the Nelder-Mead simplex method!. This method 
applies to function of arbitrary number of parameters and does not involve calculation of the 
derivatives of the function. In Figure 1 we illustrate our method for the case of a signal with just 
one spindown parameter f\ so that we need to find the maximum of J- with respect to frequency 
/o and fi . The use of the maximization procedure leads to an increase in the significance of the 
detected events and an improved accuracy of the estimation of the parameters of the signal. It 
does not involve an increase in the complexity of the computation with respect to a one-step 
search over a grid of templates. The complexity is still of the order of Np (N log 2 N + L), where 
Np is the number of templates in the grid, N is the number of points in the data sequence (we 
assume we use FFT to calculate our statistics) and L is some small number. 
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Figure 1: A two-step procedure to find the maximum of the statistics T to estimate the parameters of the signal. A 
search over a coarse grid gives a maximum marked by a circle and a fine search using the Nelder-Mead algorithms 
gives the maximum marked by a star. The true maximum is marked by a diamond. The upper plot gives the 
magnitude of the FFT for data before filtering (dotted line) and after filtering (continuous line) implied in the 
definition of T . The lower plot is a contour plot of the statistics T as a function of frequency /o and spindown 
parameter /i. The steps of the simplex algorithm are displayed. 



